Grounded Reinforcement Learning: Learning to Win the Game under Human Commands
We consider the problem of building a reinforcement learning (RL) agent that can both accomplish non-trivial tasks, like winning a real-time strategy game, and strictly follow high-level language commands from humans, like "attack", even if a command is sub-optimal. We call this novel yet important problem Grounded Reinforcement Learning (GRL). Compared with other language grounding tasks, GRL is particularly non-trivial and cannot be simply solved by pure RL or behavior cloning (BC). From the RL perspective, it is extremely challenging to derive a precise reward function for human preferences since the commands are abstract and the valid behaviors are highly complicated and multi-modal. From the BC perspective, it is impossible to obtain perfect demonstrations since human strategies in complex games are typically sub-optimal. We tackle GRL via a simple, tractable, and practical constrained RL objective and develop an iterative RL algorithm, REinforced demonstration Distillation (RED), to obtain a strong GRL policy. We evaluate the policies derived by RED, BC, and pure RL methods on a simplified real-time strategy game, MiniRTS. Experimental results and human studies show that the RED policy consistently follows human commands and achieves a higher win rate than the baselines. We release our code and present more examples at https://sites.google.com/view/grounded-rl.
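The constrained RL objective the abstract describes can be sketched in miniature: maximize the task reward subject to a command-following constraint, relaxed with a Lagrange multiplier. The function names, the Lagrangian-relaxation form, and all constants below are illustrative assumptions, not RED's actual formulation:

```python
# Hypothetical sketch of a constrained RL objective of the kind the
# abstract describes: maximize the win-rate reward subject to a
# command-following constraint, relaxed with a Lagrange multiplier.
# All names and the specific form are illustrative, not taken from RED.

def constrained_objective(task_reward, command_follow_score, threshold, lam):
    """Lagrangian relaxation: task reward minus a penalty whenever the
    policy's command-following score falls below the threshold."""
    violation = max(0.0, threshold - command_follow_score)
    return task_reward - lam * violation

def update_multiplier(lam, command_follow_score, threshold, step=0.1):
    """Dual ascent: grow the multiplier while the constraint is violated,
    shrink it (down to zero) while the constraint is satisfied."""
    return max(0.0, lam + step * (threshold - command_follow_score))
```

Under this scheme the policy is trained against `constrained_objective` while the multiplier is updated between iterations, so command-following pressure rises automatically whenever the policy drifts toward reward-only play.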
Interleaved LLM and Motion Planning for Generalized Multi-Object Collection in Large Scene Graphs
Yang, Ruochu, Zhou, Yu, Zhang, Fumin, Hou, Mengxue
Household robots have been a longstanding research topic, but they still lack human-like intelligence, particularly in manipulating open-set objects and navigating large environments efficiently and accurately. To push this boundary, we consider a generalized multi-object collection problem in large scene graphs, where the robot needs to pick up and place multiple objects across multiple locations over a long mission comprising multiple human commands. This problem is extremely challenging since it requires long-horizon planning in a vast action-state space under high uncertainty. To this end, we propose Inter-LLM, a novel interleaved LLM and motion planning algorithm. By designing a multimodal action cost similarity function, our algorithm can both reflect on past history and look ahead to the future to optimize plans, striking a good balance between quality and efficiency. Simulation experiments demonstrate that, compared with the latest works, our algorithm improves overall mission performance by 30% in terms of fulfilling human commands, maximizing mission success rates, and minimizing mission costs.
OceanPlan: Hierarchical Planning and Replanning for Natural Language AUV Piloting in Large-scale Unexplored Ocean Environments
Yang, Ruochu, Zhang, Fumin, Hou, Mengxue
We develop a hierarchical LLM-task-motion planning and replanning framework to efficiently ground an abstract human command into tangible Autonomous Underwater Vehicle (AUV) control through enhanced representations of the world. We also incorporate a holistic replanner that provides real-world feedback to all planners for robust AUV operation. While there has been extensive research on bridging the gap between LLMs and robotic missions, existing approaches cannot guarantee the success of AUV applications in the vast and unknown ocean environment. To tackle the specific challenges of marine robotics, we design a hierarchical planner that composes executable motion plans, achieving planning efficiency and solution quality by decomposing long-horizon missions into sub-tasks. At the same time, a real-time data stream is consumed by the replanner to address environmental uncertainties during plan execution. Experiments validate that our proposed framework delivers successful AUV performance on long-duration missions through natural-language piloting.
HGIC: A Hand Gesture Based Interactive Control System for Efficient and Scalable Multi-UAV Operations
Hu, Mengsha, Li, Jinzhou, Jin, Runxiang, Shi, Chao, Xu, Lei, Liu, Rui
As technological advancements continue to expand the capabilities of multi-unmanned-aerial-vehicle (mUAV) systems, human operators face challenges in scalability and efficiency due to the complex cognitive load and operations associated with motion adjustments and team coordination. Such cognitive demands limit the feasible size of mUAV teams and necessitate extensive operator training, impeding broader adoption. This paper develops Hand Gesture Based Interactive Control (HGIC), a novel interface system that utilizes computer vision techniques to intuitively translate hand gestures into modular commands for robot teaming. Through learned control models, these commands enable efficient and scalable mUAV motion control and adjustments. HGIC eliminates the need for specialized hardware and offers two key benefits: 1) minimal training requirements through natural gestures; and 2) enhanced scalability and efficiency via adaptable commands. By reducing the cognitive burden on operators, HGIC opens the door to more effective large-scale mUAV applications in complex, dynamic, and uncertain scenarios. HGIC will be open-sourced after the paper is published online, aiming to drive forward innovations in human-mUAV interaction.
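The gesture-to-modular-command idea can be illustrated with a small mapping layer: a vision model (not shown) yields a discrete gesture label, which is translated into a team-level command. The gesture names, command fields, and fallback behavior below are hypothetical, not HGIC's actual vocabulary:

```python
# Illustrative sketch of translating recognized hand gestures into
# modular mUAV team commands. Gesture labels and command schemas are
# invented for illustration; the real system's vocabulary may differ.

GESTURE_TO_COMMAND = {
    "open_palm":   {"action": "hover", "scope": "team"},
    "fist":        {"action": "land", "scope": "team"},
    "point_left":  {"action": "translate", "scope": "team", "dir": "left"},
    "two_fingers": {"action": "split", "scope": "team", "groups": 2},
}

def interpret(gesture: str) -> dict:
    """Map a recognized gesture label to a modular command; unknown
    gestures fall back to a safe team-wide hover."""
    return GESTURE_TO_COMMAND.get(gesture, {"action": "hover", "scope": "team"})
```

Keeping the commands modular like this is what lets one gesture address a whole team regardless of its size, which is the scalability benefit the abstract emphasizes.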
LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination
Liu, Jijia, Yu, Chao, Gao, Jiaxuan, Xie, Yuqing, Liao, Qingmin, Wu, Yi, Wang, Yu
AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed, where players can communicate in natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while maintaining real-time execution. In particular, HLA adopts a hierarchical framework comprising three modules: a proficient LLM, referred to as the Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as the Fast Mind, for generating macro actions; and a reactive policy, referred to as the Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communication.
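The three-tier hierarchy the abstract names can be sketched as a pipeline from language to intention, to macro actions, to atomic actions. The stub logic below is purely illustrative: in HLA the first two modules are LLMs and the third is a learned reactive policy, not hand-written rules:

```python
# Schematic sketch of HLA's hierarchy: Slow Mind (intention reasoning),
# Fast Mind (macro-action generation), Executor (atomic actions).
# All stub behaviors, intentions, and action names are hypothetical.

def slow_mind(chat_message: str) -> str:
    """Infer the partner's intention from natural language (stubbed)."""
    return "wants_onion_soup" if "soup" in chat_message else "unknown"

def fast_mind(intention: str) -> list:
    """Expand an intention into a short macro-action plan (stubbed)."""
    plans = {"wants_onion_soup": ["fetch_onion", "cook", "serve"]}
    return plans.get(intention, ["idle"])

def executor(macro: str) -> list:
    """Ground one macro action into atomic moves (stubbed)."""
    atomic = {"fetch_onion": ["move_to(onion)", "pick"],
              "cook": ["move_to(pot)", "drop"],
              "serve": ["move_to(counter)", "drop"]}
    return atomic.get(macro, ["wait"])

def hla_step(chat_message: str) -> list:
    """Full pipeline: language -> intention -> macros -> atomic actions."""
    plan = fast_mind(slow_mind(chat_message))
    return [a for macro in plan for a in executor(macro)]
```

The latency argument falls out of the structure: only the Executor must run at game rate, the Fast Mind runs at macro-action rate, and the Slow Mind is invoked only when new language arrives.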
SayTap: Language to Quadrupedal Locomotion
Tang, Yujin, Yu, Wenhao, Tan, Jie, Zen, Heiga, Faust, Aleksandra, Harada, Tatsuya
Simple and effective interaction between humans and quadrupedal robots paves the way towards creating intelligent and capable helper robots, forging a future where technology enhances our lives in ways beyond our imagination [1, 2, 3]. Key to such a human-robot interaction system is enabling quadrupedal robots to respond to natural language instructions, as language is one of the most important communication channels for human beings. Recent developments in Large Language Models (LLMs) have engendered a spectrum of applications that were once considered unachievable, including virtual assistance [4], code generation [5], translation [6], and logical reasoning [7], fueled by the proficiency of LLMs to ingest an enormous amount of historical data, to adapt in-context to novel tasks with few examples, and to understand and interact with user intentions through a natural language interface. The burgeoning success of LLMs has also kindled interest within the robotics research community, with an aim to develop interactive and capable systems for physical robots [8, 9, 10, 11, 12, 13]. Researchers have demonstrated the potential of using LLMs to perform high-level planning [8, 9] and robot code writing [11, 13]. Nevertheless, unlike text generation, where LLMs directly interpret the atomic elements--tokens--it often proves challenging for LLMs to comprehend low-level robotic commands such as joint angle targets or motor torques, especially for inherently unstable legged robots necessitating high-frequency control signals. Consequently, most existing work presumes the provision of high-level APIs for LLMs to dictate robot behaviour, inherently limiting the system's expressive capabilities. We address this limitation by using foot contact patterns as an interface that bridges human instructions in natural language and low-level commands.
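The contact-pattern interface mentioned in the last sentence can be illustrated schematically: each gait becomes a cyclic binary pattern (1 = foot on ground) for the four feet over discrete timesteps. The gait patterns, foot labels, and lookup function below are simplified assumptions for illustration, not the paper's actual templates:

```python
# Sketch of a foot-contact-pattern interface: gaits as cyclic binary
# schedules for the four feet FL, FR, RL, RR (1 = stance, 0 = swing).
# These two-step patterns are deliberately simplified illustrations.

GAIT_PATTERNS = {
    # Trot: diagonal foot pairs alternate.
    "trot":  {"FL": [1, 0], "FR": [0, 1], "RL": [0, 1], "RR": [1, 0]},
    # Bound: front pair and rear pair alternate.
    "bound": {"FL": [1, 0], "FR": [1, 0], "RL": [0, 1], "RR": [0, 1]},
}

def contact_at(gait: str, foot: str, t: int) -> int:
    """Desired contact state of one foot at timestep t (cyclic)."""
    pattern = GAIT_PATTERNS[gait][foot]
    return pattern[t % len(pattern)]
```

The appeal of this interface is that a language model only has to emit a discrete pattern like these, while a separate low-level controller handles the high-frequency signals needed to realize it.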
Collaborative Bimanual Manipulation Using Optimal Motion Adaptation and Interaction Control
Wen, Ruoshi, Rouxel, Quentin, Mistry, Michael, Li, Zhibin, Tiseo, Carlo
This work developed collaborative bimanual manipulation for reliable and safe human-robot collaboration, which allows remote and local human operators to work interactively on bimanual tasks. We proposed an optimal motion adaptation to retarget arbitrary commands from multiple human operators into feasible control references. The collaborative manipulation framework has three main modules: (1) contact force modulation for compliant physical interactions with objects via admittance control; (2) task-space sequential equilibrium and inverse kinematics optimization, which adapts interactive commands from multiple operators into feasible motions by satisfying the task constraints and physical limits of the robots; and (3) an interaction controller adopted from fractal impedance control, which is robust to time delay and remains stable when superimposing multiple control efforts for generating desired joint torques and controlling the dual-arm robots. Extensive experiments demonstrated the capability of the collaborative bimanual framework, including (1) dual-arm teleoperation that adapts arbitrary infeasible commands that violate joint torque limits into continuous operations within safe boundaries, compared to failures without the proposed optimization; (2) robust maneuvering of a stack of objects via physical interactions in the presence of model inaccuracy; and (3) collaborative multi-operator part assembly and teleoperated industrial connector insertion, which validate the guaranteed stability of reliable human-robot co-manipulation.
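Module (1) above relies on admittance control, in which a measured contact force is mapped to a compliant motion through virtual mass-damper-spring dynamics M*a + D*v + K*x = F. A minimal one-dimensional sketch, with illustrative gains and a simple Euler integration that are not the paper's implementation:

```python
# One-dimensional admittance-control sketch: an external force f_ext
# drives a virtual mass-damper-spring system whose position x becomes
# the compliant motion reference. Gains (M, D, K) and the Euler step
# are illustrative choices, not values from the paper.

def admittance_step(x, v, f_ext, dt=0.01, M=1.0, D=20.0, K=100.0):
    """One integration step of M*a + D*v + K*x = f_ext; returns (x, v)."""
    a = (f_ext - D * v - K * x) / M   # virtual acceleration
    v = v + a * dt                    # integrate velocity
    x = x + v * dt                    # integrate position
    return x, v
```

Under a constant force the offset settles at f_ext / K, so stiffness K directly sets how much the arm yields to contact, which is the compliance knob such a module exposes.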
Report: China 'Slaughterbots' Can Kill Without Human Command
In a dangerous AI "arms race," China is exporting killer drone weapons and pilotless aircraft with AK-47 rifles to combat zones in the Middle East, Asia, and Africa, U.K.'s The Sun reported. Nicknamed "slaughterbots" in the report, the stealth weapons can deploy a targeted strike from the air "without a human pressing the fire button," per The Sun, citing a report by the U.S.'s Center for a New American Security (CNAS). "Though many current generation drones are primarily remotely operated, Chinese officials generally expect drones and military robotics to feature ever more extensive AI and autonomous capabilities in the future," the think tank's Gregory C. Allen claims, per the report. "Chinese weapons manufacturers already are selling armed drones with significant amounts of combat autonomy." The report pointed to a "Blowfish A2 drone" advertised as having "full autonomy all the way up to targeted strikes," according to Allen.
Indian student creates chip that can help phones respond faster to human commands
As the world increasingly embraces artificial intelligence, there is a growing demand for devices to perform facial recognition and respond faster to speech-to-text and human commands. This would involve either expensive infrastructure to support natural language processing (NLP) or machine learning on the device itself, or bandwidth to send data to the cloud for decision-making. However, smaller devices do not have the power or infrastructure needed to run the data through complex neural networks - layered meshes of simple processing nodes on which machine learning algorithms are trained - locally on the devices, and hence have to send the data to the cloud, which increases response times. Now, an Indian-origin student at the Massachusetts Institute of Technology (MIT) in the US has developed a new processor chip that can speed up data processing on neural networks by 3-7 times while reducing power consumption by 94-95%. This means that smaller devices such as smartphones or smart home appliances could run NLP or face recognition locally and, in turn, respond faster to human commands.